From: W.I. Wells

Date: July 21, 1953 

Abstract: A particular solution of the optimum filter design problem is presented
for the class of problems where the noise is normally distributed
and the data are described by a normal distribution. This is
an application of the general results obtained in previous
memos. This particular problem lends itself to an analytic
solution and in fact turns out to yield a linear filter. This
linear filter has long been used in tracking problems, but it
is believed that this is the first rigorous derivation. Three
examples are given which represent the complete range of characteristics
of the filter, and from them one is able to make statements
about the necessary "memory" of the filter. One is also able
to draw conclusions about how often one should sample a function,
based upon the irregularity of the function and on the noise
that contaminates the measurements.


I. INTRODUCTION 


As a direct application of the general concepts set forth in
previous reports [1, 2], a problem is presented which not only illustrates
many of the important general points but, fortunately, leads to an exact
analytical solution. Under certain circumstances this problem may have
important practical applications, but for the present it will be treated
as a purely hypothetical case.

We will describe the problem in the statistical sense and then
present three particular solutions. As a preparation for this it is
necessary to describe the problem in terms that are readily interpreted
so as to define the two functions of the filter, detection and selection.

The problem to be treated is one which occurs in systems that
are designed to track, in one dimension, on the basis of sampled data.
The characteristics of the track and of the noise that contaminates the
data are given. From these characteristics we are able to compute, on
the basis of the received data, the probability distribution functions for
the expected position and velocity of the track after the reception of data.
This is the detection problem.

[1] M-1812, "The Philosophy of Statistical Filter Design," W.I. Wells, Jan. 27, 1953.
[2] M-1886, "The Specification of an Ideal Detector as a First Step in Filter
Design," W.I. Wells, March 6, 1953.

The selection problem comes in choosing, from this probability 
density distribution function, the value of position and velocity that 
represents the "best" guess of position and velocity under the constraints 
imposed by the desired overall function of the filter. This action of 
selection will be discussed first since its action, in this case, is such 
that we simplify the calculations if it is known. 

We suppose that the probability distribution function has been
calculated. We now desire to find the spot at which we expect to find
the track when the next data arrive. If we are going to examine the
following data in such a way that only data extremely close to
the predicted position will be used, we will be obliged to look at the
most probable value. If, however, we are going to look for data that may
be anywhere in the general neighborhood, we may choose another criterion.
The mathematical statement of this problem is that we wish to center our
area of search on the probability distribution curve in such a way as to
cause the product of these two curves to include the most area. Under
these conditions we will have the greatest probability of "seeing" the
data that come in next time. To carry out this process mathematically,
we note that this is just the convolution integral representation.
Thus we construct a function which represents our effective area of search
and convolve this with the probability distribution function. Then the
maximum point on the resulting curve is the correct place to put the center
of the area of search in order to maximize the probability of "seeing" the new
data.
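The selection rule just described can be sketched numerically. The following is a minimal illustration only: the grid, the rectangular search window of half-width 0.525, and the normal density with mean 1.3 are all assumed values chosen for the example, not quantities from this memo.

```python
import math

def normal_pdf(x, mean, std):
    """Normal probability density."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def best_search_center(density, window, grid):
    """Return the grid point c that maximizes sum_x density(x) * window(x - c),
    i.e. the probability of 'seeing' the next datum inside the search area."""
    step = grid[1] - grid[0]
    best_c, best_p = None, -1.0
    for c in grid:
        p = sum(density(x) * window(x - c) for x in grid) * step
        if p > best_p:
            best_c, best_p = c, p
    return best_c, best_p

# Hypothetical example values: density centered at 1.3, symmetric window.
grid = [i * 0.05 for i in range(-100, 161)]            # -5.0 ... 8.0
density = lambda x: normal_pdf(x, 1.3, 0.8)
window = lambda d: 1.0 if abs(d) <= 0.525 else 0.0     # rectangular search area

center, prob = best_search_center(density, window, grid)
print(center, prob)
```

With a symmetric density and a symmetric window the maximizing center coincides with the peak of the density, which is the simplification used in the remainder of the memo.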


We note that if the probability distribution curve is symmetrical
about its highest point and the curve of search area is also
symmetrical, then the convolution of the two has its
highest point where the search area curve is centered on the highest,
or center, point of the distribution curve. If this is the case, one may
just forget about this convolution operation and look for the center or
maximum of the probability distribution function. This is the case in
the problem being reported in this paper. It will be shown that the
probability distribution functions come out as normal distributions, and
since the search area is assumed symmetrical, we need only ask for the
highest point on the normal curve as the best place for us to center our
search area.

The problem from here on is to do the job of detection, that is, 
to calculate the probability distribution curves. It will be understood 
that the process of selection merely picks out the peak of the distribution 
curve so we will give formulae that do this as part of the following 
calculations. The function that we expect to filter is X(t).


II. THE PROBLEM

First we shall discuss the characteristics of X(t). We choose
to say that the second derivative of X(t) is a constant in the intervals
between samples. This is an arbitrary representation, but as the samples
become closer together the representation becomes exact. This means that
the curve X(t) is to be approximated with a sequence of parabolas, each
connecting two adjacent sample times in such a way that the curve of X(t)
is smooth, i.e., the curve is continuous at the sample times, with the
second derivative assuming a new value for each interval. We will call the
first derivative the velocity and the second derivative the acceleration.
We have not yet completely described the process. For any set of samples
one may fill in several smooth curves with the above characteristics.

See Fig. 1. 



The remaining restriction must tell which curve will be used for the
approximation. We do this by requiring that the probability density
of the acceleration be a normal distribution centered at zero.
That is, the smaller accelerations are more likely than the larger ones;
hence, the approximating curve will be the one with the least violent turns.
This means that in Fig. 1 the continuous curve (A) is
much more likely than the dashed curve (B). We write this distribution
as

$$W(a) = \frac{1}{\sqrt{2\pi}\,\alpha}\, e^{-a^2/2\alpha^2} \tag{1}$$

where $a$ is the acceleration and $\alpha^2$ is the variance of the normal distribution.


Next we will discuss the type of noise to be considered. When
we sample the curve we make a measurement of the value of X at a certain
time. If this measurement is exact, there is no noise. We wish to consider
the particular case where there is some doubt as to the exact value of X.
In fact we assume that the exact value of X is normally distributed about
the measured value of the sample. Let the n'th sample be called $S_n$.
Then

$$W(X) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(X-S_n)^2/2\sigma^2} \tag{2}$$

where $\sigma^2$ is the variance of the normal distribution.


III. SOLUTION


This completes the characterization of the problem. Next we shall
show how these two distributions may be used to calculate the probability
distribution function of predicted X and V (detection).

We begin by assuming that we have received the first sample ($S_1$)
at $t = 1$. Now we may ask for the probability distribution of X. This is
obviously

$$W_1(X) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(X-S_1)^2/2\sigma^2} \tag{3}$$

Now, there is a time lapse before the next sample, and two effects
become apparent. First, if there is a first derivative, or velocity (V),
at $t = 1$, then we can say, if the acceleration (a) is zero, that at $t = 2$ the
distribution will be the same as for $t = 1$ except that X will be increased
to X+V. We measure (V) and (a) in terms of the sample interval, so that
$V\,(t_2 - t_1) = V$. In other words, if $a = 0$, then at $t = 2$

$$W_2(X) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(X-V-S_1)^2/2\sigma^2} \tag{4}$$

where V is the velocity at $t = 1$. We can imagine this as a distribution
curve which is a function of both (X) and (V). We plot this on an (X,V)
plane where the height above the plane is the height of the probability
density. When the first sample $S_1$ was received, we had a
plot as follows:



Fig. 2

We see here that the "mound" is uniform in the V coordinate. This means
that after the first sample is received, one knows nothing about derivatives
and all velocities are equally likely. In addition, no matter where we cut
through, the X distribution is normal about $S_1$.

Now, if $a = 0$, we can draw the distribution at $t = 2$. We note that
if the initial velocity is zero, the distribution curve is the same. Thus,
on the X axis, where $V = 0$, the "mound" remains fixed. If the velocity is
some other value, the mound shifts to the right a distance V. One sees
that the whole mound is twisted as follows.

Fig. 3

It is seen to be skewed with respect to the V axis. This means that if
the position at $t = 2$ is found to be $S_2$, the most likely value for V will be
found at $V = S_2 - S_1$, where the ridge of this mound intersects
the line $X = S_2$.

Now we examine what happens to this distribution if the acceleration
is not zero. Suppose it is exactly (a). Then we know that in the time
$t_2 - t_1$, X will increase to

$$X_2 = X_1 + V_1 + \frac{a}{2} \tag{5}$$

and V to

$$V_2 = V_1 + a \tag{6}$$

since (a) is constant in the interval. We ask what this has done to the
"mound" that we started with in Fig. 2. Obviously if $a = 0$, we get Fig. 3.
If (a) is some number, the mound retains its same cross-sectional shape
but is now centered at $X = S_1 + a/2$ on the X axis and is parallel to the line

$$2V = X - S_1 \tag{7}$$

rather than

$$V = X - S_1 \tag{8}$$

as in Fig. 3.

The formal procedure for accomplishing the transition from Fig. 3
to the desired curve is the convolution. When the acceleration is fixed
at a known value ($a_0$), its distribution function is an impulse. If this
is thought of as a function of V, we have $U_0(V - a_0)$, where $U_0(X)$ is the unit
impulse function: it is zero everywhere except at $X = 0$, and its integral
over X is one. Convolving this with a distribution containing V merely
replaces V by $V - a_0$, which is the desired transformation. Likewise, if
the impulse is written $U_0(2X - a_0)$ and convolved with a distribution containing
X, we replace X by $X - a_0/2$. So we have for the case above

$$W_2(X,V) = \iint W_1(\sigma - \tau,\, \tau)\; U_0\!\left(X - \sigma - \frac{a_0}{2}\right) U_0(V - \tau - a_0)\; d\sigma\, d\tau \tag{9}$$

Both steps have been combined here; $\sigma$ and $\tau$ are simply the variables
of integration. We note that $W_1(\sigma - \tau, \tau)$ has had X replaced
by $X - V$ to account for the initial velocity at $t = 1$. (This is the transition
from Fig. 2 to Fig. 3.) The convolution shifts the curves further to allow
for the acceleration during the interval.

It is helpful to think of this process in terms of filters.
We note that when W(a) is an impulse, we pass the function $W_1(X-V, V)$ through
a "two-dimensional" filter whose impulse response is an impulse, delayed
$a_0$ units in the V direction and $a_0/2$ units in the X direction. Since the
operations of integration and differentiation involved are linear, it
is proper that when W(a) is different from an impulse, the impulse response
of the filter will also be different. In fact it will be W(a), also
shifted (a) units in the V direction and a/2 units in the X direction.

The needed filter has the impulse response related to W(a) as follows:

$$W(a) \;\longrightarrow\; W(V)\, W(2X) \tag{10}$$

We note that in the X,V plane these convolutions all take place along a
straight line $V - 2X = \text{const}$. For convenience we change variables from
X,V to W,Z, where $Z = \text{const}$ is the equation of the line $V - 2X = \text{const}$.
In terms of W, Z we should
expect that the impulse response of our filter contains only W. The equations
for the transformed coordinates are:

$$\sqrt{5}\,Z = V - 2X, \qquad \sqrt{5}\,W = 2V + X, \qquad \sqrt{5}\,V = Z + 2W, \qquad \sqrt{5}\,X = -2Z + W \tag{11}$$
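The change of variables (11) can be checked numerically. The sketch below is illustrative only; the function names `to_wz` and `to_xv` and the test values of x, v and a are assumptions made for the example. It confirms that the two pairs of equations are inverses of one another and that a shift of a/2 in X together with a in V leaves Z unchanged, so the acceleration acts along W alone.

```python
import math

SQRT5 = math.sqrt(5.0)

def to_wz(x, v):
    """(X, V) -> (W, Z) using sqrt(5) W = 2V + X and sqrt(5) Z = V - 2X."""
    return (2 * v + x) / SQRT5, (v - 2 * x) / SQRT5

def to_xv(w, z):
    """(W, Z) -> (X, V) using sqrt(5) X = W - 2Z and sqrt(5) V = Z + 2W."""
    return (w - 2 * z) / SQRT5, (z + 2 * w) / SQRT5

# Hypothetical test point and acceleration.
x, v, a = 1.7, -0.4, 0.9
w0, z0 = to_wz(x, v)
w1, z1 = to_wz(x + a / 2, v + a)   # effect of a constant acceleration a

print("round trip:", to_xv(w0, z0))
print("shift in Z:", z1 - z0)
print("shift in W:", w1 - w0, "expected", SQRT5 * a / 2)
```

The shift in W equals $\sqrt{5}\,a/2$, which is why the acceleration $a$ corresponds to $2W/\sqrt{5}$ when the acceleration density is rewritten in terms of W.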


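We note that the filter corresponding to a fixed known acceleration $a_0$,
$U_0\!\left(X - \frac{a_0}{2}\right) U_0(V - a_0)$, becomes

$$U_0\!\left(\frac{W - 2Z}{\sqrt{5}} - \frac{a_0}{2}\right) U_0\!\left(\frac{Z + 2W}{\sqrt{5}} - a_0\right) \tag{12}$$

This has a value only where both brackets are zero. Subtracting twice the
first bracket from the second eliminates W and $a_0$, thus:

$$\frac{5}{\sqrt{5}}\,Z = 0$$

which is satisfied only for $Z = 0$, and the impulse response becomes

$$U_0\!\left(\frac{2W}{\sqrt{5}} - a_0\right) \tag{13}$$

or in general

$$W\!\left(\frac{2W}{\sqrt{5}}\right) \tag{14}$$

that is, the acceleration density W(a) evaluated at $a = 2W/\sqrt{5}$.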


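Now we return to the problem laid out. The distribution of acceleration
was to be normal:

$$W(a) = \frac{1}{\sqrt{2\pi}\,\alpha}\, e^{-a^2/2\alpha^2} \tag{15}$$

Dropping the constant multiplier, we have from Eq. 14,

$$W(W) \propto e^{-\frac{4}{10\alpha^2} W^2}$$

Let

$$\delta = \frac{4}{10\,\alpha^2} \tag{16}$$

so that

$$W(W) \propto e^{-\delta W^2} \tag{17}$$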


The normal distribution of true position about observed position may be
written

$$W_n(W,Z) \propto e^{-\beta (W - 2Z - b_n)^2} \tag{18}$$

where

$$\beta = \frac{1}{10\,\sigma^2}, \qquad b_n = \sqrt{5}\, S_n \tag{19}$$


With this notation we may begin the process of constructing the
probability density curves W(W,Z) after receiving several pieces of
data.

We receive the first sample, and at $t = 1$ we have

$$W(W,Z,t_1) \propto e^{-\beta (W - 2Z - b_1)^2} \tag{20}$$

(The symbol $\propto$ stands for "is proportional to.")

We replace X by X-V to allow for initial velocities. This is done in
W,Z by replacing

$$W \text{ by } \tfrac{1}{5}(3W - Z), \qquad Z \text{ by } \tfrac{1}{5}(7Z + 4W) \tag{21}$$

giving:

$$e^{-\beta (W + 3Z + b_1)^2} \tag{22}$$

We now convolve this with $e^{-\delta W^2}$, which gives:

$$W(W,Z,t_2) \propto e^{-\frac{\beta\delta}{\beta+\delta} (W + 3Z + b_1)^2} \tag{23}$$

This is the distribution just before the reception of the second sample.
When the second sample is received we have:

$$W(W,Z,t_2) \propto e^{-\frac{\beta\delta}{\beta+\delta} (W + 3Z + b_1)^2 \;-\; \beta (W - 2Z - b_2)^2} \tag{24}$$
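The two steps that lead from the first-sample distribution to the distribution just before the second sample can be written out explicitly. This is only a check of the algebra under the coordinate definitions and the substitution (21); the primed symbols $W'$, $Z'$ and the abbreviation $c$ are introduced here for convenience and are not the memo's notation.

```latex
% Substitution (21): W' = (3W - Z)/5,  Z' = (7Z + 4W)/5
W' - 2Z' = \tfrac{1}{5}\bigl[(3W - Z) - 2(7Z + 4W)\bigr] = -(W + 3Z)
\;\Longrightarrow\;
e^{-\beta (W' - 2Z' - b_1)^2} = e^{-\beta (W + 3Z + b_1)^2}

% Convolution in W of two Gaussians (c does not depend on W)
\int_{-\infty}^{\infty} e^{-\beta (w + c)^2}\, e^{-\delta (W - w)^2}\, dw
\;\propto\; e^{-\frac{\beta\delta}{\beta+\delta}(W + c)^2},
\qquad c = 3Z + b_1
```

The second line is the usual rule that convolving two normal curves adds their variances, so the coefficients $\beta$ and $\delta$ combine as $\beta\delta/(\beta+\delta)$.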


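Again we change variables to allow for initial velocities and convolve
with $e^{-\delta W^2}$, getting

$$W(W,Z,t_3) \propto \exp\!\left\{-\,\frac{\gamma\delta\,(3W + 4Z + b_1)^2 + \beta\delta\,(W + 3Z + b_2)^2 + \gamma\beta\,(5Z - b_1 + 3b_2)^2}{9\gamma + \beta + \delta}\right\}, \qquad \gamma = \frac{\beta\delta}{\beta+\delta} \tag{25}$$

This is the distribution just before the reception of the third sample.
The values of (W, Z), or equivalently (X, V), which maximize this expression are the best
guess at the predicted position and velocity at the third sample.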


From the length of this expression one notes that the algebra
is becoming very long. Thus we shall attempt to find a recursion formula
to be used in numerical examples. We note that by further manipulations
each of the previous expressions can be put in the form:

$$e^{-A_1 (W + BZ + C_1)^2 \;-\; A_2 (Z + C_2)^2} \tag{26}$$

Since the expressions retain this form, we merely choose an appropriate
beginning and calculate the A, B, C of one time in terms of those at the
preceding time.

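A convenient spot to begin is where we have just received a
sample and then changed variables to allow for initial velocities. Then
we have

$$e^{-A_{1,n-1} (W + B_{n-1} Z + C_{1,n-1})^2 \;-\; A_{2,n-1} (Z + C_{2,n-1})^2} \tag{27}$$

which gives the distribution of (V) and (X+V) after the (n-1)'th sample.
Now we convolve this with $e^{-\delta W^2}$ to get the distribution of the predicted
X and V. Then we receive the n'th sample and change variables to allow
for initial velocities. Putting this back in the above form, we have

$$A_{1n} = XY^2 + \tfrac{16}{25} A_{2,n-1} + \beta \tag{28}$$

$$A_{2n} = \frac{(X + \beta)\, A_{2,n-1} + X \beta U^2}{A_{1n}} \tag{29}$$

$$B_n = \frac{XYQ + \tfrac{28}{25} A_{2,n-1} + 3\beta}{A_{1n}} \tag{30}$$

$$C_{1n} = \frac{XY\, C_{1,n-1} + \tfrac{4}{5} A_{2,n-1} C_{2,n-1} + \beta b_n}{A_{1n}} \tag{31}$$

$$C_{2n} = \frac{XQ\, C_{1,n-1} + \tfrac{7}{5} A_{2,n-1} C_{2,n-1} + 3\beta b_n - A_{1n} B_n C_{1n}}{A_{2n}} \tag{32}$$

where

$$X = \frac{A_{1,n-1}\,\delta}{A_{1,n-1} + \delta}, \qquad Y = \frac{4B_{n-1} + 3}{5}, \qquad Q = \frac{7B_{n-1} - 1}{5}, \qquad U = B_{n-1} + 2 \tag{33}$$

and $b_n = \sqrt{5}\,S_n$, with $S_n$ the n'th sample. (Here X, Y, Q, U are
auxiliary quantities; this X is not the position.)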


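These are the desired recursion relationships. We may now ask for the
most probable (W,Z) or (X,V). These are the coordinates of the peak of
the "mound" that we have. These are found to be

$$V_n = (2B_n - 1)\,\frac{C_{2n}}{\sqrt{5}} - 2\,\frac{C_{1n}}{\sqrt{5}} \tag{34}$$

$$\bar{X}_{n+1} = (B_n + 2)\,\frac{C_{2n}}{\sqrt{5}} - \frac{C_{1n}}{\sqrt{5}} \tag{35}$$

where $\bar{X}_{n+1} = X_n + V_n$ is the predicted position for the following,
(n+1)'th, sample, and

$$X_n = (3 - B_n)\,\frac{C_{2n}}{\sqrt{5}} + \frac{C_{1n}}{\sqrt{5}} \tag{36}$$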
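The recursion and the peak formulas can be exercised on a small data set. The sketch below is a hedged illustration, not a transcription of any program from the memo: the function names, the scaled quantities `c1 = C1/sqrt(5)` and `c2 = C2/sqrt(5)` (used so that exact fractions can be kept), the convention `delta=None` for an infinite delta, and the sample values are all assumptions made for the example.

```python
from fractions import Fraction as F

def step(state, sample, beta, delta):
    """One cycle: convolve with the acceleration curve, take in a new
    sample, then shift for velocity. state = (A1, A2, B, c1, c2) with
    c1 = C1/sqrt(5), c2 = C2/sqrt(5). delta=None means delta -> infinity."""
    A1, A2, B, c1, c2 = state
    X = A1 if delta is None else A1 * delta / (A1 + delta)
    Y = (4 * B + 3) / 5
    Q = (7 * B - 1) / 5
    U = B + 2
    A1n = X * Y**2 + F(16, 25) * A2 + beta
    A2n = ((X + beta) * A2 + X * beta * U**2) / A1n
    Bn = (X * Y * Q + F(28, 25) * A2 + 3 * beta) / A1n
    c1n = (X * Y * c1 + F(4, 5) * A2 * c2 + beta * sample) / A1n
    c2n = (X * Q * c1 + F(7, 5) * A2 * c2 + 3 * beta * sample
           - A1n * Bn * c1n) / A2n
    return A1n, A2n, Bn, c1n, c2n

def estimates(state):
    """Most probable smoothed position X_n and velocity V_n."""
    _, _, B, c1, c2 = state
    return (3 - B) * c2 + c1, (2 * B - 1) * c2 - 2 * c1

def run(samples, beta, delta):
    """Return [(X_n, V_n)] for n = 2, 3, ... (velocity is unknown at n = 1)."""
    beta = F(beta)
    state = (beta, F(0), F(3), F(samples[0]), F(0))   # after the first sample
    out = []
    for s in samples[1:]:
        state = step(state, F(s), beta, delta)
        out.append(estimates(state))
    return out

# Hypothetical noise-free samples on the line 2 + 3n.
line = run([5, 8, 11, 14], beta=1, delta=F(1))
print(line)
```

For samples lying exactly on a straight line the estimates reproduce the line (position equal to the latest sample, velocity equal to the slope), which is a convenient sanity check on the signs in the recursion.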

Somewhat more useful formulae are obtained when one notes that
these equations may be written as follows:

$$V_n = V_{n-1} + M\,(S_n - \bar{X}_n) \tag{37}$$

$$X_n = \bar{X}_n + N\,(S_n - \bar{X}_n) \tag{38}$$

where

$$M = \frac{\left[(2B_{n-1} - 1)\,XU + 2A_{2,n-1}\right]\beta}{A_{1n} A_{2n}} \tag{39}$$

$$N = \frac{\left(XU^2 + A_{2,n-1}\right)\beta}{A_{1n} A_{2n}} \tag{40}$$
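The filter in this form is a short loop: predict the position, compare it with the new sample, and correct the velocity and position by M and N times the difference. The sketch below is illustrative; the function name `run_filter` and the sample values are assumptions, and the gain pairs are the values of M and N quoted for n = 2, 3, 4 in Example 3 of this memo.

```python
from fractions import Fraction as F

def run_filter(samples, gains):
    """Apply V_n = V_(n-1) + M (S_n - Xbar_n) and X_n = Xbar_n + N (S_n - Xbar_n).

    gains is a list of (M, N) pairs for n = 2, 3, ...
    The starting velocity is unknown; because M_2 = N_2 = 1 the
    arbitrary choice V_1 = 0 has no effect on later estimates.
    """
    x, v = samples[0], 0
    history = [(x, v)]
    for s, (m, n) in zip(samples[1:], gains):
        x_pred = x + v            # predicted position for this sample
        err = s - x_pred          # difference between sample and prediction
        v = v + m * err
        x = x_pred + n * err
        history.append((x, v))
    return history

# (M_n, N_n) for n = 2, 3, 4 as quoted in Example 3.
gains = [(F(1), F(1)), (F(3, 4), F(7, 8)), (F(42, 55), F(47, 55))]

on_line = run_filter([5, 8, 11, 14], gains)   # hypothetical noise-free samples
bumpy = run_filter([0, 1, 3], gains)          # hypothetical samples off a line
print(on_line)
print(bumpy)
```

Samples on a straight line are followed exactly, while a sample that departs from the prediction moves the estimate only part of the way toward it.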


The size of the distributions is of interest, especially in
the X direction, since the samples are received in X. To get the unconditional
distribution of X, we merely integrate W(X,V) over all V. This leaves
a normal distribution in X of the form:

$$e^{-(X - m)^2 / 2\bar{\sigma}^2} \tag{41}$$

If we ask for the distribution of $X_n$, we have

$$\bar{\sigma}_{X_n}^2 = \frac{A_{1n}(3 - B_n)^2 + A_{2n}}{10\, A_{1n} A_{2n}} \tag{42}$$

and for $\bar{X}_{n+1}$ we have

$$\bar{\sigma}_{\bar{X}_{n+1}}^2 = \frac{X (B_n + 2)^2 + A_{2n}}{10\, X A_{2n}}, \qquad X = \frac{A_{1n}\,\delta}{A_{1n} + \delta} \tag{43}$$
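The gains and variances can be tabulated directly from the A and B recursion. The sketch below is a hedged illustration: the function names are assumptions, and the case $\beta = \delta = 1$ is chosen because it is the one worked by hand in Example 3, so the printed rows can be compared with the values given there.

```python
from fractions import Fraction as F

def advance(A1, A2, B, beta, delta):
    """Advance A1, A2, B by one sample and return the gains M, N
    that apply when that sample is received."""
    X = A1 * delta / (A1 + delta)
    Y, Q, U = (4 * B + 3) / 5, (7 * B - 1) / 5, B + 2
    A1n = X * Y**2 + F(16, 25) * A2 + beta
    A2n = ((X + beta) * A2 + X * beta * U**2) / A1n
    Bn = (X * Y * Q + F(28, 25) * A2 + 3 * beta) / A1n
    M = beta * ((2 * B - 1) * X * U + 2 * A2) / (A1n * A2n)
    N = beta * (X * U**2 + A2) / (A1n * A2n)
    return A1n, A2n, Bn, M, N

def variances(A1, A2, B, delta):
    """Variance of the smoothed position X_n and of the predicted
    position for the next sample."""
    smoothed = (A1 * (3 - B)**2 + A2) / (10 * A1 * A2)
    X = A1 * delta / (A1 + delta)
    predicted = (X * (B + 2)**2 + A2) / (10 * X * A2)
    return smoothed, predicted

beta = delta = F(1)               # sigma^2 = 1/10, alpha^2 = 4/10
state = (beta, F(0), F(3))        # A1, A2, B after the first sample
rows = []
for n in range(2, 6):
    A1, A2, B, M, N = advance(*state, beta, delta)
    rows.append((n, M, N) + variances(A1, A2, B, delta))
    state = (A1, A2, B)

for row in rows:
    print(row)
```

Each row holds n, M, N, the smoothed variance and the predicted variance; the gains settle within a few samples for this choice of $\beta$ and $\delta$.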


These are all of the relations that are needed to solve any particular
problem. The particular problems are described by giving $\alpha$ and $\sigma$.

IV. DISCUSSION

First, let us discuss the outstanding characteristics of the
solution and then present some examples. To actually construct the
optimum filter we need only Eqs. 37 and 38 and the appropriate values
of M and N for the particular data. This filter has an important characteristic
which is not usually so evident: M and N are functions of the number of
pieces of data received. In other words, when one first begins to filter
data, the filter uses one set of values of M and N; then, as more and more
data are received, these values of M and N change, eventually reaching
a fixed value asymptotically. In many cases the "transient" behavior
is over very quickly and one loses very little by using the fixed filter
straight through. On the other hand, there are cases where the final
form of the filter is useless unless the "transient" is used first.

The most striking example of this behavior is that where the acceleration
is zero at all times. In this case X(t) is a straight line. The final
values of M and N are zero. This occurs only after an infinite number
of samples have been received. Thus, during the "transient" the filter
determines the straight line, and then after a long time no more data are
accepted. In this case the "transient" behavior is the important part.
One, therefore, sees that we cannot ask for the final values of M and N
and automatically expect to have a good filter.

In many cases, however, the transient is relatively short and
new data are weighted rather heavily, and thus only small errors are encountered
by using a fixed filter with the final values of M and N. Whether this
procedure is possible depends upon the relative values of $\alpha$ and $\sigma$.
In any case, the procedure for calculating the optimum filter is given
above.


The next thing to be considered is the "goodness" of the filter.
Even though the filter may be "optimum" it may not do enough good to
warrant its use. Therefore we ask how much sharpening of the distribution
curves we get by using this filter with several pieces of data. Since
W(X) is a normal curve, its variance $\bar{\sigma}^2$ gives its sharpness. Eqs. 42 and 43
give $\bar{\sigma}^2$ for the whole family of filters after any number of samples.
As the number of samples increases, $\bar{\sigma}^2$ decreases asymptotically to a fixed
value which represents the "best" that the filter can do with this particular
type of data. For a good filter $\bar{\sigma}^2$ should be considerably smaller than $\sigma^2$.

We will now give three simple examples. The first leads to the
straight line smoothing equations.

V. EXAMPLE 1.

Let $\alpha = 0$. This says $a = 0$; thus X(t) is a straight line.
We receive our first sample and have

$$W(X,V,t_1) \propto e^{-(X - S_1)^2 / 2\sigma^2}$$

which after a shift due to velocity and change of variable becomes

$$e^{-\beta (W + 3Z + b_1)^2}$$

From Eq. 27: $A_{11} = \beta$, $A_{21} = 0$, $B_1 = 3$, $C_{11} = b_1$, $C_{21} = 0$

From Eq. 16: $\delta = \dfrac{4}{10\,\alpha^2} = \infty$

From Eqs. 33: $X = \beta$, $Y = 3$, $Q = 4$, $U = 5$

Eq. 28: $A_{12} = 10\beta$

Eq. 29: $A_{22} = \tfrac{5}{2}\beta$

Eq. 30: $B_2 = \tfrac{3}{2}$

Eqs. 39, 40: $M_2 = 1$, $N_2 = 1$

Eqs. 42, 43: $\bar{\sigma}_{X_2}^2 = \sigma^2$, $\bar{\sigma}_{\bar{X}_3}^2 = 5\sigma^2$

Now repeating these steps one finds in general 



where n is the number of samples.
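For the straight-line case the gains coincide with the classical least-squares weights for fitting a line to n equally spaced samples. As a hedged reference, the closed forms below are the standard result for such an expanding-memory fit; they are assumed here rather than taken from the text, and they agree with $M_2 = N_2 = 1$ and with the variances $\sigma^2$ and $5\sigma^2$ found for two samples.

```latex
M_n = \frac{6}{n(n+1)}, \qquad
N_n = \frac{2(2n-1)}{n(n+1)}, \qquad
\bar{\sigma}_{X_n}^2 = N_n\,\sigma^2, \qquad
\bar{\sigma}_{\bar{X}_{n+1}}^2 = \frac{2(2n+1)}{n(n-1)}\,\sigma^2
```

Both gains tend to zero as n grows, which is the behavior described in the Discussion for zero acceleration.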
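VI. EXAMPLE 2.

Let $\sigma = 0$, which says there is no noise; hence we should expect
very little filtering.

We begin again by receiving the first sample, converting to (W,Z), and allowing
for initial velocities; this gives as before:

$$e^{-\beta (W + 3Z + b_1)^2}$$

$A_{11} = \infty$, $A_{21} = 0$, $B_1 = 3$, $C_{11} = b_1$, $C_{21} = 0$

$\beta = \dfrac{1}{10\,\sigma^2} = \infty$

$X = \delta$, $Y = 3$, $Q = 4$, $U = 5$

$A_{22} = 25\delta$, $B_2 = 3$

$M_2 = 1$, $N_2 = 1$

$\bar{\sigma}_{X_2}^2 = 0$, $\bar{\sigma}_{\bar{X}_3}^2 = \tfrac{1}{2}\alpha^2$

Repeating:

$X = \delta$, $Y = 3$, $Q = 4$, $U = 5$

$A_{13} = \beta$, $A_{23} = 50\delta$, $B_3 = 3$

$M_3 = \tfrac{3}{2}$, $N_3 = 1$

$\bar{\sigma}_{X_3}^2 = 0$, $\bar{\sigma}_{\bar{X}_4}^2 = \tfrac{3}{8}\alpha^2$

Again:

$A_{24} = 75\delta$, $B_4 = 3$

$M_4 = \tfrac{5}{3}$, $N_4 = 1$

$\bar{\sigma}_{X_4}^2 = 0$, $\bar{\sigma}_{\bar{X}_5}^2 = \tfrac{1}{3}\alpha^2$

or in general:

$$A_{2n} = 25(n-1)\,\delta, \qquad B_n = 3, \qquad M_n = \frac{2n-3}{n-1}, \qquad N_n = 1, \qquad \bar{\sigma}_{\bar{X}_{n+1}}^2 = \frac{n}{4(n-1)}\,\alpha^2$$

Fig. 6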


We see that the filter reaches a nearly fixed value after about 6 pieces
of data. In this case we conclude that the "transient" is fairly short
and reasonably good results will be obtained by using the filter M = 2, N = 1
from the beginning. Since there is no noise in the data, the smoothed
positions are obviously not in error. The predicted positions are such
that there is an improvement from $\bar{\sigma}^2 = .5\,\alpha^2$ to $\bar{\sigma}^2 = .25\,\alpha^2$ for the
final filter.

Both of the above examples could have been worked by simpler
means. The third example is a more realistic problem, and the results
do not come out so nicely; however, by continuation of the above processes
we can work this example for as many steps as we desire.

VII. EXAMPLE 3.

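Let $\beta = \delta = 1$; thus $\sigma^2 = 1/10$ and $\alpha^2 = 4/10$.

This is the combined case of noise plus non-straight-line signals.
Again we find:

$A_{11} = \beta = 1$, $A_{21} = 0$, $B_1 = 3$, $C_{11} = b_1$

$X = \tfrac{1}{2}$, $Y = 3$, $Q = 4$, $U = 5$

$A_{12} = \tfrac{11}{2}$, $A_{22} = \tfrac{25}{11}$, $B_2 = \tfrac{18}{11}$

$M_2 = 1$, $N_2 = 1$

$\bar{\sigma}_{X_2}^2 = \tfrac{1}{10} = \sigma^2$, $\bar{\sigma}_{\bar{X}_3}^2 = \tfrac{7}{10} = 7\sigma^2$

Repeating:

$X = \tfrac{11}{13}$, $Y = \tfrac{21}{11}$, $Q = \tfrac{23}{11}$, $U = \tfrac{40}{11}$

$A_{13} = \tfrac{72}{13}$, $A_{23} = \tfrac{25}{9}$, $B_3 = \tfrac{29}{18}$

$M_3 = \tfrac{3}{4}$, $N_3 = \tfrac{7}{8}$

$\bar{\sigma}_{X_3}^2 = \tfrac{7}{80} = \tfrac{7}{8}\sigma^2$, $\bar{\sigma}_{\bar{X}_4}^2 = \tfrac{47}{80} = \tfrac{47}{8}\sigma^2$

Again:

$X = \tfrac{72}{85}$, $Y = \tfrac{17}{9}$, $Q = \tfrac{37}{18}$, $U = \tfrac{65}{18}$

$A_{14} = \tfrac{29}{5}$, $A_{24} = \tfrac{1375}{493}$, $B_4 = \tfrac{47}{29}$

$M_4 = \tfrac{42}{55}$, $N_4 = \tfrac{47}{55}$

$\bar{\sigma}_{X_4}^2 = \tfrac{47}{550} = \tfrac{47}{55}\sigma^2$, $\bar{\sigma}_{\bar{X}_5}^2 = \tfrac{323}{550} = \tfrac{323}{55}\sigma^2$

Once more:

$A_{15} = \tfrac{199}{34}$, $A_{25} = \tfrac{274{,}050}{98{,}107}$, $B_5 = \tfrac{322}{199}$

$M_5 = \tfrac{289}{378}$, $N_5 = \tfrac{323}{378}$

$\bar{\sigma}_{\bar{X}_6}^2 = \tfrac{1189}{2030} = \tfrac{1189}{203}\sigma^2$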



n is the number of samples.


Fig. 7

We see that for this filter the transient is over within about 4 samples.
Also, we see no great improvement in the smoothed data, i.e., from
$\bar{\sigma}_{X_n}^2 = \sigma^2$ to $.85\,\sigma^2$. The predicted position has a distribution roughly
2.4 times as wide as the received data. This is to be expected since,
with $\alpha^2 = 4/10$, there is a fairly large possibility of variation from
a straight line.
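These figures can be cross-checked by a different route. The sketch below does not use the memo's A, B, C recursion; it propagates the covariance of position and velocity directly (the covariance form that later became standard for recursive least-squares tracking), with the same model of constant acceleration over each interval. The function name, the large starting velocity variance `v_var0` standing in for "all velocities equally likely", and the number of steps are assumptions made for the example.

```python
import math

def covariance_recursion(sigma2, alpha2, steps, v_var0=1e8):
    """Return rows (n, M, N, predicted variance, smoothed variance) for
    n = 2..steps. State is (position, velocity); over one interval
    X -> X + V + a/2 and V -> V + a, with a of variance alpha2."""
    pxx, pxv, pvv = sigma2, 0.0, v_var0       # after the first sample
    rows = []
    for n in range(2, steps + 1):
        # Predict across one sample interval.
        pxx, pxv, pvv = (pxx + 2 * pxv + pvv + alpha2 / 4,
                         pxv + pvv + alpha2 / 2,
                         pvv + alpha2)
        predicted = pxx
        total = pxx + sigma2
        n_gain, m_gain = pxx / total, pxv / total   # position and velocity gains
        # Update with the new sample of position.
        pxx, pxv, pvv = (pxx - n_gain * pxx,
                         pxv - n_gain * pxv,
                         pvv - m_gain * pxv)
        rows.append((n, m_gain, n_gain, predicted, pxx))
    return rows

sigma2, alpha2 = 0.1, 0.4                      # the values of Example 3
table = covariance_recursion(sigma2, alpha2, steps=30)
final = table[-1]
width_ratio = math.sqrt(final[3] / sigma2)     # predicted width / data width
print(table[1], table[2])
print(final[2], width_ratio)
```

For n = 3 and n = 4 this reproduces the gains and variances listed above, and after many samples the position gain and the width ratio settle near the values .85 and 2.4 quoted in the text.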


These three examples illustrate the two extreme problems and 
one intermediate one. Any other problem of this type can be worked out 
using the same recursion formulae. 

It should be repeated that for every combination of $\alpha$ and $\sigma$
one can determine M and N (actually these depend only on $\alpha/\sigma$), which
completely define the linear filter; then one also may calculate the
$\bar{\sigma}$'s which describe the quality of the filter. There is one further
criterion which could be easily calculated. This is the variance of
the unconditional probability distribution of velocity. It was not
included since it is of secondary importance when the data involve
X only.

VIII. CONCLUSION

The complete solution of the design of the desired filter is 
now given. This involves the two processes, detection and selection. 

In this particular case the optimum filter turned out to be linear. 

It is conceivable that for the same problem, but for a different search 
area criterion, one would find a filter that is not linear. The detection 
or calculation of the distribution curves would be the same, only the 
selection problem would differ. 

For other problems, in which the distribution curves of the 
acceleration and noise are not normal, one would in theory follow the 
same process to calculate the distribution functions of predicted position 
and velocity. Then an appropriate criterion would be used in the selection 
process. Since the calculation of the distribution curves is not always 
possible by analytic means for the general distribution, it is hoped that 
these problems may be solved by composing the actual distributions from 
sums of appropriately chosen normal functions. Then the distribution 
curves of the predicted quantities will be combinations of sums of the
type of solution presented here. 

There are certain extensions that permit further constraints 
on certain of the variables, such as requiring the acceleration distribution 
to be a function of the present velocity, etc. These concepts are very
clear; the only question is whether solutions of these more general forms
can be done analytically. It does not seem economically feasible to do 
these calculations with some special analog device, although it is perfectly 
possible conceptually. 


Signed

W.I. Wells


Approve